- Path: mayne.ugrad.cs.ubc.ca!not-for-mail
- From: c2a192@ugrad.cs.ubc.ca (Kazimir Kylheku)
- Newsgroups: comp.lang.c,comp.unix.programmer
- Subject: Re: Q: '\n' character
- Date: 16 Apr 1996 13:32:33 -0700
- Organization: Computer Science, University of B.C., Vancouver, B.C., Canada
- Message-ID: <4l1051INN7r7@mayne.ugrad.cs.ubc.ca>
- References: <4kj66f$k0o@ren.cei.net> <AD97189A966891F2@mcdiala02.it.luc.edu> <4ktn04INNoev@keats.ugrad.cs.ubc.ca> <4ku8f9$d3o@mark.ucdavis.edu>
- NNTP-Posting-Host: mayne.ugrad.cs.ubc.ca
-
- In article <4ku8f9$d3o@mark.ucdavis.edu>,
- James Knight <knight@quad.cs.ucdavis.edu> wrote:
- >I just bypass all of the problems with fgets using the function below. It handles
-
- OK, here is mine!
-
- I was 30 minutes late for an 8:30 number theory final because of this, so I may
- not graduate.
-
- My ``general'' line-reading function was tested on various binary and text
- files, including an eight-megabyte file full of zeroes, created on a Linux
- system with the command ``dd if=/dev/zero of=testfile bs=1024 count=8192''.
-
- The function is configurable with respect to:
-
- - handling of null characters (they are kept by default, or are optionally
- converted to CHAR_MAX characters, or are optionally taken to be alternate
- line terminators equivalent to '\n').
-
- - handling of memory allocation faults. In the event that a memory allocation
- fails, the routine has the option of either free()ing what it has so far
- and returning a null pointer, or keeping what it has so far (after adding
- proper zero termination as in a successful case).
-
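- The three null-handling policies above can be sketched in isolation. This is
- only an illustration, not the posted code: the helper name nul_policy() and
- its return convention are made up here, and only the two flag values are
- copied from the freadln.h header in this post.
-

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Flag values copied from the freadln.h header in this post. */
enum { FR_CONV = 0x01, FR_NL = 0x02 };

/*
 * Hypothetical helper (not part of freadln): decide what to do with a
 * null byte read from the stream.  Returns 1 if the null ends the line;
 * otherwise returns 0 and stores the character to append in *out.
 */
static int nul_policy(int inflags, char *out)
{
    assert(out != NULL);

    if (inflags & FR_NL)          /* checked first: null acts like '\n' */
        return 1;

    /* keep the null by default, or replace it when FR_CONV is set */
    *out = (inflags & FR_CONV) ? CHAR_MAX : '\0';
    return 0;
}
```

- As in the posted switch statement, FR_NL is tested before FR_CONV, so if
- both are given the null ends the line and no conversion takes place.
-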
- The function reports, in the form of binary flags, whether:
-
- - whether EOF was encountered (if this happens when reading something other
- than the first character of the line, the line is also treated as
- incomplete).
- - whether the line was incomplete (this is also reported whenever a null
- character is interpreted as the end of a line). So if EOF is _not_ reported,
- but premature end of line _is_ reported, it means that the line ended with
- a null character rather than a newline (of course, this interpretation of
- null characters must be enabled for this to occur).
- - whether the function ran out of memory, or overflowed the maximum allocation
- size, when reading the line. In either case NULL is returned unless the keep
- flag has been specified, and NULL is always returned when the very first
- malloc() fails (since then there is no meaningful buffer pointer to revert
- to).
-
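- A caller might decode the returned pointer and output flags along these
- lines. This is a sketch only: the enum line_end and the function
- line_status() are names invented for the example; the flag values are the
- ones from the freadln.h header in this post.
-

```c
#include <assert.h>   /* for checks such as the one in the note below */
#include <stddef.h>

/* Output flag values copied from the freadln.h header in this post. */
enum { FR_EOF = 0x01, FR_NONL = 0x02, FR_NOMEM = 0x04, FR_OFLOW = 0x08 };

/* Hypothetical classification of how a call to freadln() ended. */
enum line_end {
    LE_FAILED,        /* NULL returned: nothing to use or free      */
    LE_PARTIAL,       /* allocation problem, partial line was kept  */
    LE_EOF_EMPTY,     /* EOF hit before any character was read      */
    LE_EOF_NO_NL,     /* last line of input, not newline-terminated */
    LE_NUL,           /* line ended by a null character             */
    LE_NEWLINE        /* ordinary complete line                     */
};

static enum line_end line_status(const char *line, int ouf)
{
    if (line == NULL)
        return LE_FAILED;
    if (ouf & (FR_NOMEM | FR_OFLOW))
        return LE_PARTIAL;
    if (ouf & FR_EOF)
        return (ouf & FR_NONL) ? LE_EOF_NO_NL : LE_EOF_EMPTY;
    if (ouf & FR_NONL)
        return LE_NUL;
    return LE_NEWLINE;
}
```

- For example, line_status(line, FR_NONL) yields LE_NUL, which is the ``no
- EOF, but premature end of line'' case described above.
-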
- The measured performance on a Linux P90/32MB system was 2.5 seconds of user
- time and 2.5 seconds of system time to read a single line of 8MB of zero
- characters, at about 50% CPU. Reading the same file while interpreting each
- zero as a newline character required 81 seconds of user time and two seconds
- of system time, at 93% CPU.
-
- I have checked its handling of malloc() failure by artificially
- restricting the data segment size of the process to 1024K (via the ulimit
- built-in command of the GNU Bourne-Again Shell), and had it try to read an
- eight megabyte long line. It correctly returned NULL when partial keeping was
- not specified, and returned about 800-900K when partial keeping in the event of
- allocation failure _was_ specified.
-
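- A different way to provoke allocation failures, without touching ulimit, is
- to inject them. The sketch below is not what was done for the tests described
- above: fail_after() and test_realloc() are hypothetical names, and freadln as
- posted calls malloc() and realloc() directly, so it would have to be built
- against such a wrapper for this to apply.
-

```c
#include <assert.h>   /* for the checks a test would make */
#include <stdlib.h>

/*
 * Hypothetical test aid: an allocator wrapper that starts failing
 * after a set number of successful calls.
 */
static long allocs_left = -1;          /* -1 means never fail */

/* Allow n more successful allocations, then fail; -1 disables failing. */
static void fail_after(long n)
{
    allocs_left = n;
}

static void *test_realloc(void *p, size_t n)
{
    if (allocs_left == 0)
        return NULL;                   /* simulated failure; p is untouched */
    if (allocs_left > 0)
        allocs_left--;
    return realloc(p, n);              /* realloc(NULL, n) acts as malloc(n) */
}
```

- Calling fail_after(0) exercises the ``very first allocation fails'' path,
- and larger counts exercise the keep and no-keep paths at later growth steps.
-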
- I have also tested all kinds of cases, and all the options in combination with
- malloc failure. One test I did with the sample harness program was to filter
- the operating system kernel through it. A file comparison program indicated only
- the difference that an extra newline was added in the destination (which could
- have been taken care of in the test program by checking for an incomplete
- line).
-
- The only feature that is untested is the overflow check applied when computing
- a larger buffer size, which guards against the new size exceeding the range of
- size_t for a sufficiently large file. (ANSI C guarantees only that size_t can
- represent object sizes up to 32767.)
-
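- The overflow check can at least be exercised on its own, without a huge
- input file. The sketch below isolates the same ``sum is smaller than an
- operand'' test used when growing the buffer; the function name grow_size()
- is made up for the example.
-

```c
#include <assert.h>
#include <stddef.h>

/*
 * Returns 1 and stores cursize + oldsize in *newsize, or returns 0 if
 * the sum does not fit in size_t.  Unsigned arithmetic wraps around,
 * so a wrapped sum is always smaller than either operand.
 */
static int grow_size(size_t cursize, size_t oldsize, size_t *newsize)
{
    size_t sum = cursize + oldsize;

    assert(newsize != NULL);
    if (sum < cursize)                 /* wrapped around: overflow */
        return 0;
    *newsize = sum;
    return 1;
}
```

- Here (size_t)-1 is the largest value a size_t can hold, so
- grow_size((size_t)-1, 1, &n) reports overflow, while grow_size(89, 55, &n)
- succeeds and stores 144.
-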
- BUGS:
-
- - when the program is unable to resize the buffer to the actual bytes that
- were read, it gives up and returns the oversized buffer (but gives the correct
- line length nevertheless). This shouldn't happen in practice, with a sound
- realloc() implementation.
-
- - input/output flags could be combined into a single parameter, as with
- the select() system call, for example.
-
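- One possible reading of the second item is a single value-result flags word:
- the caller stores input flags in it and the function writes the output flags
- back into the same word. The bit layout and helper names below are purely
- hypothetical and are not part of the posted interface.
-

```c
#include <assert.h>   /* for the checks a test would make */

/* Hypothetical layout: input flags in the low byte, output flags above. */
#define FR_IN_MASK   0x00ffu
#define FR_OUT_SHIFT 8

/* Extract the input flags supplied by the caller. */
static unsigned fr_get_in(unsigned flags)
{
    return flags & FR_IN_MASK;
}

/* Extract the output flags written back by the callee. */
static unsigned fr_get_out(unsigned flags)
{
    return flags >> FR_OUT_SHIFT;
}

/* Replace the output part of the word, leaving the input flags alone. */
static unsigned fr_set_out(unsigned flags, unsigned ouf)
{
    return (flags & FR_IN_MASK) | (ouf << FR_OUT_SHIFT);
}
```

- With this layout a caller passing FR_KEEP (0x04) that gets back FR_EOF and
- FR_NONL (0x01 | 0x02) would see the word 0x0304.
-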
- --------
-
- The freadln.h header file:
-
- #ifndef FREADLN_H
- #define FREADLN_H
-
- #include <stdio.h>    /* FILE, size_t */
-
- typedef enum fr_inflg {
-     FR_NOI  = 0x00,   /* no input flags */
-     FR_CONV = 0x01,   /* convert null chars to CHAR_MAX */
-     FR_NL   = 0x02,   /* treat nulls as newlines */
-     FR_KEEP = 0x04    /* if NOMEM, keep partial line */
- } fr_inflg;
-
- typedef enum fr_outflg {
-     FR_NOO   = 0x00,  /* no output flags */
-     FR_EOF   = 0x01,  /* EOF was encountered */
-     FR_NONL  = 0x02,  /* line did not end in newline */
-     FR_NOMEM = 0x04,  /* out of memory */
-     FR_OFLOW = 0x08   /* allocation size_t overflow */
- } fr_outflg;
-
- char *freadln(const fr_inflg inf, fr_outflg *ouf, size_t *len, FILE *stream);
-
- #endif /* FREADLN_H */
-
-
- --------
-
- The freadln.c implementation:
-
- #include <stdio.h>
- #include <stdlib.h>
- #include <limits.h>
-
- #include "freadln.h"
-
- #define INIT_SIZE1 55 /* Fibonacci numbers */
- #define INIT_SIZE2 89
-
- char *freadln(const fr_inflg inf, fr_outflg *ouf, size_t *len, FILE *stream)
- {
-     size_t oldsize = INIT_SIZE1, cursize = INIT_SIZE2, newsize;
-     char *line, *pline, *limit, *new;
-
-     *ouf = FR_NOO;
-
-     pline = line = malloc(cursize);
-
-     if (!line) {
-         *ouf = FR_NOMEM;
-         goto failure;
-     }
-
-     limit = line + cursize - 2;         /* guarantee room for null */
-
-     while (1) {
-         int c = getc(stream);
-
-         switch (c) {
-         case '\0':
-             if (inf & FR_NL)            /* nulls terminate */
-                 goto badend;
-             if (inf & FR_CONV) {        /* nulls get replaced */
-                 c = CHAR_MAX;
-                 goto addchar;
-             }
-             goto addchar;
-             break;
-         case EOF:
-             *ouf = FR_EOF;
-             if (pline == line)          /* first character? */
-                 goto success;
-         badend:                         /* line end w/o newline */
-             *ouf |= FR_NONL;
-             /* fall through */
-         case '\n':
-             goto success;               /* jump out of loop */
-         addchar:
-         default:
-             *pline++ = c;
-             break;
-         }
-
-         if (pline >= limit) {           /* if buffer is full... */
-             newsize = cursize + oldsize;
-             if (newsize < cursize) {    /* overflow! */
-                 *ouf = FR_OFLOW;
-                 if (inf & FR_KEEP)
-                     goto success;
-                 free(line);
-                 goto failure;
-             }
-             new = realloc(line, newsize);
-             if (!new) {
-                 *ouf = FR_NOMEM;
-                 if (inf & FR_KEEP)
-                     goto success;
-                 free(line);
-                 goto failure;
-             }
-             oldsize = cursize;
-             cursize = newsize;
-             pline = new + (pline - line);
-             line = new;
-             limit = line + cursize - 2;
-         }
-     }
-
- success:
-     *pline++ = '\0';                    /* null-terminate */
-     *len = pline - line - 1;            /* calculate line length */
-     new = realloc(line, *len + 1);      /* try to trim buffer down */
-     return (new) ? new : line;          /* if cannot, ah well... */
-
- failure:
-     return NULL;
- }
-
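- The buffer grows along the Fibonacci sequence (new size = current size +
- previous size), starting from 55 and 89. To get a feel for how few
- reallocations that needs, the growth rule can be run on its own. The function
- name fib_grow_steps() is invented for this sketch, and it ignores the two
- bytes of slack that freadln reserves at the end of the buffer.
-

```c
#include <assert.h>   /* for the checks a test would make */
#include <stddef.h>

#define INIT_SIZE1 55
#define INIT_SIZE2 89

/*
 * Count how many times a buffer that starts at INIT_SIZE2 bytes must
 * grow, using newsize = cursize + oldsize, before it holds `need`
 * bytes.  Returns -1 if the size would overflow size_t first.
 */
static int fib_grow_steps(size_t need)
{
    size_t oldsize = INIT_SIZE1, cursize = INIT_SIZE2, newsize;
    int steps = 0;

    while (cursize < need) {
        newsize = cursize + oldsize;
        if (newsize < cursize)         /* wrapped around */
            return -1;
        oldsize = cursize;
        cursize = newsize;
        steps++;
    }
    return steps;
}
```

- Under these assumptions an eight-megabyte line (8388608 bytes) is reached
- after 24 growth steps, the last one giving a buffer of 9227465 bytes.
-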
-
-
- --------
-
- Sample main ``test harness'' filter:
-
- #include <stdio.h>
- #include <stdlib.h>
- #include "freadln.h"
-
- int main(int argc, char **argv)
- {
-     fr_inflg inf;
-     fr_outflg ouf;                  /* output flags have their own type */
-     size_t len, maxlen = 0;
-     char *line;
-
-     if ((++argv, --argc) && *argv)
-         inf = (fr_inflg) atoi(*argv);
-     else
-         inf = FR_NOI;
-
-     while ((line = freadln(inf, &ouf, &len, stdin))) {
-         fwrite(line, 1, len, stdout);
-         free(line);
-         if (len > maxlen)
-             maxlen = len;
-         putchar('\n');
-         if (ouf & (FR_NOMEM|FR_OFLOW)) {
-             /* size_t is printed via a cast, since %d expects an int */
-             fprintf(stderr, "ouf == %d, maxlen == %lu\n",
-                     (int) ouf, (unsigned long) maxlen);
-             exit(EXIT_FAILURE);
-         }
-         if (ouf & FR_EOF) {
-             fprintf(stderr, "maxlen == %lu\n", (unsigned long) maxlen);
-             exit(EXIT_SUCCESS);
-         }
-     }
-
-     fprintf(stderr, "line == NULL, maxlen == %lu\n", (unsigned long) maxlen);
-     return EXIT_FAILURE;
- }
-
- --
- I'm not really a jerk, but I play one on Usenet.
-